AITopics | activation pattern

Hierarchical Spatio-Channel Clustering for Efficient Model Compression in Medical Image Analysis

Hamlomo, Sisipho, Atemkeng, Marcellin, Likassa, Habte Tadesse, Ravelo, Blaise, Bouwmans, Thierry, Lalléchère, Sébastien, Vacavant, Antoine, Chen, Ding-Geng

arXiv.org Machine LearningApr-28-2026

Convolutional neural networks (CNNs) have become increasingly difficult to deploy in resource-constrained environments due to their large memory and computational requirements. Although low-rank compression methods can reduce this burden, most existing approaches compress spatial and channel redundancy independently and therefore do not fully exploit the localised structure within convolutional feature maps. This paper proposes a hierarchical spatio-channel low-rank compression framework for CNNs that exploits redundancy across spatial regions and channel activations. Unlike conventional methods, which apply a uniform decomposition across an entire layer, the proposed approach first partitions feature maps into spatial regions, then groups channels according to their co-activation patterns within each region, and finally applies rank-adaptive SVD to each resulting spatio-channel cluster. The method is evaluated on an AlexNet-based brain tumour MRI classification model and compared with Global SVD and Tucker decomposition under \(3\times\) and \(6\times\) compression budgets. Our method outperforms both baselines, reducing FLOPs from \(8.21\,\mathrm{G}\) to \(1.55\,\mathrm{G}\) (\(81.1\%\) reduction), achieving a \(1.38\times\) inference speed-up, and increasing classification accuracy from \(87.76\%\) to \(89.80\%\). The method also improves the macro \(F_1\)-score and performance on challenging classes such as meningioma. A hyper-parameter trade-off analysis demonstrates that the framework provides Pareto-optimal configurations, enabling control over the balance between compression and predictive performance. Moderate clustering with adaptive rank selection yields strong results. Bootstrap standard errors are reported for all classification metrics.

artificial intelligence, decomposition, machine learning, (19 more...)

arXiv.org Machine Learning

2604.23375

Country:

North America > United States (0.93)
Africa (0.68)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Add feedback

Details

Neural Information Processing SystemsApr-24-2026, 09:50:56 GMT

Here we derive Equation 8 for 0 and out = > 0. Since ESN(µ, 2,0) = NR(µ,), we can obtain Equation 4 for ID activation by specializing the result to =0 . We begin with a useful lemma. Let X ESN(0, 2,) and let a b 0, 0 c d. Then P(a X b)= (1+) h The result for P(c X d) follows analogously. For the reader's convenience, we summarize in detail a few common techniques for defining OOD scores that measure the degree of ID-ness on the given sample. All the methods derive the score post hoc on neural networks trained with in-distribution data only.

activation, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

Add feedback

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Yuanzhi Li, Yingyu Liang

Neural Information Processing SystemsMar-15-2026, 09:13:34 GMT

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.

artificial intelligence, initialization, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Deep Contract Design via Discontinuous Networks

Neural Information Processing SystemsFeb-17-2026, 05:16:13 GMT

Contract theory studies the setting where a principal seeks to design a contract for rewarding an agent on the basis of the uncertain outcomes caused by the agent's private actions [

artificial intelligence, contract, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Asia > Middle East > Israel (0.04)
North America > United States (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

7016d7b7b6e3c05b2128ac5b3aae492d-Paper-Conference.pdf

Neural Information Processing SystemsFeb-14-2026, 02:06:09 GMT

artificial intelligence, machine learning, pt ii, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.27)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

incorporate feedback into our final revision. 4 [R1]: " I don't exactly see if small batch vs large batch captures this phenomenon; if yes should say explicitly. "

Neural Information Processing SystemsFeb-13-2026, 21:02:44 GMT

We thank the reviewers for the detailed and insightful reviews. As the reviews noted, our work 1) introduces "novel Smith et al. [2017] make an explicit connection between small vs. large batch "A small discussion on if the phenomenon has been observed for different datasets/tasks with different optimizers" The phenomenon may not be true for other optimizers such as Adam, though. "concept of "memorizable and generalizable", though intuitive, is sketchy and not formally explained ... authors We acknowledge that the terms "memorizable" and "generalizable" are potentially confusing. We will revise our terminology to clarify this distinction. By "inherently noisy", we refer to the fact that high noise in the datapoints will necessitate larger sample complexity.

artificial intelligence, machine learning, noise, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Deep ReLU Networks Have Surprisingly Few Activation Patterns

Boris Hanin, David Rolnick

Neural Information Processing SystemsFeb-13-2026, 01:55:55 GMT

Neural Information Processing Systems http://nips.cc/

activation region, neuron, theorem 5, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Yuanzhi Li, Yingyu Liang

Neural Information Processing SystemsFeb-12-2026, 20:46:45 GMT

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.

artificial intelligence, initialization, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

602f5c1b803c53b2aaf0b3864bf3383a-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 13:56:44 GMT

artificial intelligence, inequality, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.27)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

AQuantitativeGeometricApproachto Neural-NetworkSmoothness

Neural Information Processing SystemsFeb-12-2026, 08:11:57 GMT

The techniques for theℓ2-perturbation do not easily transfer to theℓ -case (Leino et al., 2021). One critical step is to measure the Lipschitz constant more precisely and efficiently.

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
North America > Canada > Quebec > Montreal (0.04)
(11 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Filters

Collaborating Authors

activation pattern

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Hierarchical Spatio-Channel Clustering for Efficient Model Compression in Medical Image Analysis

Details

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Deep Contract Design via Discontinuous Networks

7016d7b7b6e3c05b2128ac5b3aae492d-Paper-Conference.pdf

incorporate feedback into our final revision. 4 [R1]: " I don't exactly see if small batch vs large batch captures this phenomenon; if yes should say explicitly. "

Deep ReLU Networks Have Surprisingly Few Activation Patterns

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

602f5c1b803c53b2aaf0b3864bf3383a-Paper-Conference.pdf

AQuantitativeGeometricApproachto Neural-NetworkSmoothness